In this study, we examined yeast proteins by two-dimensional (2D) gel electrophoresis and gathered
quantitative information from about 1,400 spots. We found that there is an enormous range of protein
abundance and, for identified spots, a good correlation among protein abundance, mRNA abundance, and
codon bias. For each molecule of well-translated mRNA, there were about 4,000 molecules of protein.
The relative abundance of proteins was measured in glucose and ethanol media. Protein turnover was
examined and found to be insignificant for abundant proteins. Some phosphoproteins were identified.
The behavior of proteins in differential centrifugation experiments was examined. Such experiments
with 2D gels can give a global view of the yeast proteome.



The sequence of the yeast genome has been determined (9). More recently, the number of mRNA molecules
for each expressed gene has been measured (27, 30). The next logical level of analysis is that of the
expressed set of proteins. We have begun to analyze the yeast proteome by using two-dimensional (2D)
gels.

2D gel electrophoresis separates proteins according to iso- 
electric point in one dimension and molecular weight in the 
other dimension (21), allowing resolution of thousands of pro- 
teins on a single gel. Although modern imaging and computing 
techniques can extract quantitative data for each of the spots in 
a 2D gel, there are only a few cases in which quantitative data 
have been gathered from 2D gels. 2D gel electrophoresis is 
almost unique in its ability to examine biological responses 
over thousands of proteins simultaneously and should there- 
fore allow us a relatively comprehensive view of cellular me- 
tabolism. 

We and others have worked toward assembling a yeast protein database consisting of a collection of
identified spots in 2D gels and of data on each of these spots under various conditions (2, 7, 8, 10,
23, 25). These data could then be used in analyzing a protein or a metabolic process. Saccharomyces
cerevisiae is a good organism for this approach since it has a well-understood physiology as well as a
large number of mutants, and its genome has been sequenced. Given the sequence and the relative lack
of introns in S. cerevisiae, it is easy to predict the sequence of the primary protein product of most
genes. This aids tremendously in identifying these proteins on 2D gels.

There are three pillars on which such a database rests: (i) visualization of many protein spots
simultaneously, (ii) quantification of the protein in each spot, and (iii) identification of the gene
product for each spot. Our first efforts at visualization and identification for S. cerevisiae have
been described elsewhere (7, 8). Here we describe quantitative data for these proteins under a variety
of experimental conditions.

MATERIALS AND METHODS 
Strains and media. S. cerevisiae W303 (MATa ade2-1 his3-11,15 leu2-3,112 trp1-1 ura3-1 can1-100) was
used (26). -Met YNB (yeast nitrogen base) medium was 1.7 g of YNB (Difco) per liter, 5 g of ammonium
sulfate per liter, and adenine, uracil, and all amino acids except methionine; -Met -Cys YNB medium
was the same but without methionine or cysteine. Medium was supplemented with 2% glucose (for most
experiments) or with 2% ethanol (for ethanol experiments). Low-phosphate YEPD was described by
Warner (28).

Isotopic labeling of yeast and preparation of cell extracts. Yeast strains were labeled and proteins
were extracted as described by Garrels et al. (7, 8). Briefly, cells were grown to 5 × 10^6 cells per
ml at 30°C; 1 ml of culture was transferred to a fresh tube, and 0.3 mCi of [35S]methionine (e.g.,
Express protein labeling mix; New England Nuclear) was added to this 1-ml culture. The cells were
incubated for a further 10 to 15 min and then transferred to a 1.5-ml microcentrifuge tube, chilled on
ice, and harvested by centrifugation. The supernatant was removed, and the cell pellet was resuspended
in 100 μl of lysis buffer (20 mM Tris-HCl [pH 7.6], 10 mM NaF, 10 mM sodium pyrophosphate, 0.5 mM
EDTA, 0.1% deoxycholate; just before use, phenylmethylsulfonyl fluoride was added to 1 mM, leupeptin
was added to 1 μg/ml, pepstatin was added to 1 μg/ml, tosylsulfonyl phenylalanyl chloromethyl ketone
was added to 10 μg/ml, and soybean trypsin inhibitor was added to 10 μg/ml).

The resuspended cells were transferred to a screw-cap 1.5-ml polypropylene tube containing 0.28 g of
glass beads (0.5-mm diameter; Biospec Products) or 0.40 g of zirconia beads (0.5-mm diameter; Biospec
Products). After the cap was secured, the tube was inserted into a MiniBeadbeater 8 (Biospec Products)
and shaken at medium-high speed at 4°C for 1 min. Breakage was typically 75%. Tubes were then spun in
a microcentrifuge for 10 s at 5,000 × g at 4°C.

With a very fine pipette tip, liquid was withdrawn from the beads and transferred to a prechilled
1.5-ml tube containing 7 μl of DNase I (0.5 mg/ml; Cooper product no. 6330)-RNase A (0.25 mg/ml;
Cooper product no. 5679)-Mg (50 mM MgCl2) mix. Typically 70 μl of liquid was recovered. The mixture
was incubated on ice for 10 min to allow the RNase and DNase to work.

Next, 75 μl of 2× dSDS (2× dSDS is 0.6% sodium dodecyl sulfate [SDS], 2% mercaptoethanol, and 0.1 M
Tris-HCl [pH 8]) was added. The tube was plunged into boiling water, incubated for 1 min, and then
plunged into ice. After cooling, the tube was centrifuged at 4°C for 3 min at 14,000 × g. The
supernatant was transferred to a fresh tube and frozen at -70°C. About 5 μl of this supernatant was
used for each 2D gel.

2D polyacrylamide gels. 2D gels were made and run as described elsewhere (6-8).

Image analysis of the gels. The Quest II software system was used for quantitative image analysis
(20, 22). Two techniques were used to collect quantitative data for analysis by Quest II software.
First, before the advent of phosphorimagers, gels were dried and fluorographed. Each gel was exposed
to film for three different times (typically 1 day, 2 weeks, and 6 weeks) to increase the dynamic
range of the data. The films were scanned along with calibration strips to relate film optical density
to disintegrations per minute in the gels and analyzed by the software to obtain a linear relationship
between disintegrations per minute in the spots and optical densities of the film images. The
quantitative data are expressed as parts per million of the total cellular protein. This value is
calculated from the disintegrations per minute of the sample loaded onto the gel and by comparing the
film density of each data spot with the density of the film over the calibration strips of known
radioactivity exposed to the same film. This yields the disintegrations per minute per millimeter for
each spot on the gel and thence its parts-per-million value.

After the advent of phosphorimaging, gels bearing 35S-labeled proteins were exposed to phosphorimager
screens and scanned by a Fuji phosphorimager, typically for two exposures per gel. Calibration strips
of known radioactivity were exposed simultaneously. Scan data from the phosphorimager were assimilated
by Quest II software, and quantitative data were recorded for the spots on the gels.

Measurements of protein turnover. Cells in exponential phase were pulse-labeled with [35S]methionine,
excess cold Met and Cys were added, and samples of equal volume were taken from the culture at
intervals up to 90 min (in one experiment) or up to 160 min (in a second experiment). Incorporation of
35S into protein was essentially 100% by the first sample (10 min). Extracts were made, and equal
fractions of the samples were loaded on 2D gels (i.e., the different samples had different amounts of
protein but equal amounts of 35S). Spots were quantitated by phosphorimaging and Quest software.

The software was queried for spots whose radioactivity decreased through the time course. The
algorithm examined all data points for all spots, drew a best-fit line through the data points, and
looked for spots where this line had a statistically significant negative slope. In one of the
experiments, there was one such spot. To the eye, this was a minor, unidentified spot seen only in the
first two samples (10 and 20 min). In the other experiment, the Quest software found no spots meeting
the criteria. Therefore, we concluded that none of the identified spots, and at most one of the
visible spots, represented a protein with a short half-life.
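The slope criterion described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not say which test the Quest software applied, so the one-sided t test on an ordinary least-squares slope, the 5% level, and the function names `fit_line` and `is_decaying` are all hypothetical choices.

```python
import math

# One-sided 5% critical values of Student's t by degrees of freedom.
# The 5% level is an assumption of this sketch, not a value from the text.
T_CRIT_ONE_SIDED_05 = {1: 6.314, 2: 2.920, 3: 2.353, 4: 2.132, 5: 2.015, 6: 1.943}


def fit_line(times, counts):
    """Ordinary least-squares fit; returns (slope, standard error of the slope)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_c = sum(counts) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (c - mean_c) for t, c in zip(times, counts))
    slope = sxy / sxx
    intercept = mean_c - slope * mean_t
    residual_ss = sum((c - (intercept + slope * t)) ** 2 for t, c in zip(times, counts))
    std_err = math.sqrt(residual_ss / (n - 2) / sxx)
    return slope, std_err


def is_decaying(times, counts):
    """True when the best-fit line has a significantly negative slope."""
    slope, std_err = fit_line(times, counts)
    if std_err == 0.0:
        return slope < 0.0
    return slope / std_err < -T_CRIT_ONE_SIDED_05[len(times) - 2]


# Hypothetical spot intensities at the chase times used in one experiment.
CHASE_TIMES = [0, 10, 20, 30, 60, 90]
STABLE_SPOT = [1000, 1010, 990, 1005, 995, 1002]
UNSTABLE_SPOT = [1000, 800, 650, 520, 270, 140]
```

With these made-up intensities, only the second spot would be flagged. Because equal amounts of 35S were loaded at every time point, a stable protein is expected to give a flat line rather than a rising one.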

Centrifugal fractionation. Cells were labeled, harvested, and broken with glass beads by the standard
method described above except that no detergent (i.e., no deoxycholate) was present in the lysis
buffer. The crude lysate was cleared of unbroken cells and large debris by centrifugation at 300 × g
for 30 s. The supernatant of this centrifugation was then spun at 16,000 × g for 10 min to give the
pellet used for Fig. 6B. The supernatant of the 16,000 × g, 10-min spin was then spun at 100,000 × g
for 30 min to give the supernatant used for Fig. 6A.

Protein abundance calculations. A haploid yeast cell contains about 4 × 10^-12 g of protein (1, 15).
Assuming a mean protein mass of 50 kDa, there are about 50 × 10^6 molecules of protein per cell. There
are about 1.8 methionines per 10 kDa of protein mass, which implies 4.5 × 10^8 molecules of methionine
per cell (neglecting the small pool of free Met). We measured (i) the counts per minute in each spot
on the 2D gels, (ii) the total number of counts on each gel (by integrating counts over the entire
gel), and (iii) the total number of counts loaded on the gel (by scintillation counting of the
original sample). Thus, we know what fraction of the total incorporated radioactivity is present in
each spot. After correcting for the methionine (and cysteine [see below]) content of each protein, we
calculated an absolute number of protein molecules based on the fraction of radioactivity in each spot
and on 50 × 10^6 total molecules per cell.

The labeling mixture used contained about one-fifth as much radioactive cysteine as radioactive
methionine. Therefore, the number of cysteine molecules per protein was also taken into account in
calculating the number of molecules of protein, but Cys molecules were weighted one-fifth as heavily
as Met molecules.
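The arithmetic in this subsection can be checked with a short sketch. The constants are the ones quoted in the text; the function names, and the way the Met/Cys correction is applied (scaling by the ratio of the average label weight to the protein's own label weight), are assumptions made for illustration. The average cysteine content is not stated in the text, so it is left as an argument.

```python
AVOGADRO = 6.022e23


def molecules_per_cell(protein_grams=4e-12, mean_mass_da=50_000):
    """Total protein molecules per haploid cell from mass and mean size."""
    return protein_grams / mean_mass_da * AVOGADRO


def methionines_per_cell(total_molecules=50e6, mean_mass_da=50_000, met_per_10kda=1.8):
    """Protein-bound methionines per cell (the free Met pool is neglected)."""
    return total_molecules * (mean_mass_da / 10_000) * met_per_10kda


def label_weight(n_met, n_cys, cys_weight=0.2):
    """Relative 35S signal per molecule; Cys counts one-fifth as much as Met."""
    return n_met + cys_weight * n_cys


def copies_per_cell(spot_fraction, n_met, n_cys, mean_met, mean_cys,
                    total_molecules=50e6):
    """Copies per cell from a spot's fraction of the incorporated label.

    Assumption of this sketch: a protein with the average Met/Cys content
    and a fraction f of the label is present at f * total_molecules copies;
    other proteins are scaled by the ratio of label weights.
    """
    average = label_weight(mean_met, mean_cys)
    return spot_fraction * total_molecules * average / label_weight(n_met, n_cys)
```

For the default inputs, `molecules_per_cell()` comes to about 4.8 × 10^7, which the text rounds to 50 × 10^6, and `methionines_per_cell()` reproduces the 4.5 × 10^8 figure.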

mRNA abundance calculations. For estimation of mRNA abundance, we used SAGE (serial analysis of gene
expression) data (27) and Affymetrix chip hybridization data (29a, 30). The mRNA column in Table 1
shows mRNA abundance calculated from SAGE data alone. However, the SAGE data came from cells growing
in YEPD medium, whereas our protein measurements were from cells growing in YNB medium. In addition,
SAGE data for low-abundance mRNAs suffer from statistical variation. Therefore, we also used chip
hybridization data (29a, 30) for mRNA from cells grown in YNB. These hybridization data also had
disadvantages. First, the amounts of high-abundance mRNAs were systematically underestimated, probably
because of saturation in the hybridizations, which used 10 μg of cRNA. For example, the abundance of
ADH1 mRNA was 197 copies per cell by SAGE but only 32 copies per cell by hybridization, and the
abundance of ENO2 mRNA was 248 copies per cell by SAGE but only 41 by hybridization. When the amount
of cRNA used in the hybridization was reduced to 1 μg, the apparent amounts of mRNA were similar to
the amounts determined by SAGE (29a, 29b). However, experiments using 1 μg of cRNA have been done for
only some genes (29a). Because amounts of mRNA were normalized to 15,000 per cell, and because the
amounts of abundant mRNAs were underestimated, there is a 2.2-fold overestimate of the abundance of
nonabundant mRNAs. We calculated this factor of 2.2 by adding together the number of mRNA molecules
from a large number of genes expressed at a low level for both SAGE data and hybridization data. The
sum for the same genes from hybridization data is 2.2-fold greater than that from SAGE data.

To take into account these difficulties, we compiled a list of "adjusted" mRNA abundance as follows.
For all high-abundance mRNAs of our identified proteins, we used SAGE data. For all of these
particular mRNAs, chip hybridization suggested that mRNA abundance was the same in YEPD and YNB media.
For medium-abundance mRNAs, SAGE data were used, but when hybridization data showed a significant
difference between YEPD and YNB, then the SAGE data were adjusted by the appropriate factor. Finally,
for low-abundance mRNAs, we used data from chip hybridizations from YNB medium but divided by 2.2 to
normalize to the SAGE results. These calculations were completed without reference to protein
abundance.
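The three-tier rule for "adjusted" mRNA can be written as a small function. Only the factor of 2.2 comes from the text. The abundance cutoffs separating high, medium, and low classes and the fold change treated as a "significant difference" are not given, so the defaults below are hypothetical placeholders, as is the function name.

```python
LOW_ABUNDANCE_CORRECTION = 2.2  # hybridization/SAGE ratio for low-abundance mRNAs


def adjusted_mrna(sage, chip_ynb, chip_yepd,
                  high_cutoff=50.0, low_cutoff=2.0, fold_cutoff=1.5):
    """Return adjusted mRNA copies per cell for one gene.

    sage      -- SAGE estimate (cells grown in YEPD)
    chip_ynb  -- chip hybridization estimate, YNB medium
    chip_yepd -- chip hybridization estimate, YEPD medium
    The three cutoffs are illustrative assumptions, not values from the text.
    """
    if sage >= high_cutoff:
        # High abundance: SAGE is used directly.
        return sage
    if sage >= low_cutoff:
        # Medium abundance: SAGE, rescaled only if the media differ clearly.
        if chip_yepd > 0:
            ratio = chip_ynb / chip_yepd
            if ratio >= fold_cutoff or ratio <= 1.0 / fold_cutoff:
                return sage * ratio
        return sage
    # Low abundance: YNB hybridization data normalized to the SAGE scale.
    return chip_ynb / LOW_ABUNDANCE_CORRECTION
```

For example, with the ADH1 values quoted above (197 copies by SAGE, 32 by hybridization) the function returns the SAGE value unchanged, since high-abundance hybridization signals are the ones the text describes as saturated.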

CAI. The codon adaptation index (CAI) was taken from the yeast proteome database (YPD) (13), for which
calculations were made according to Sharp and Li (24). Briefly, the index uses a reference set of
highly expressed genes to assign a value to each codon, and then a score for a gene is calculated from
the frequency of use of the various codons in that gene (24).
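As a rough illustration of how such an index works, the sketch below assigns each codon a weight relative to the most-used synonymous codon in a reference set and scores a gene by the geometric mean of those weights. The reference counts are toy values covering only two amino acids, the 0.5 pseudocount for unused codons is an assumption of this sketch, and the function names are hypothetical; this is not the YPD calculation itself.

```python
import math


def relative_adaptiveness(reference_counts, families, pseudocount=0.5):
    """Weight each codon by its count relative to the most-used synonym."""
    weights = {}
    for family in families:
        # A codon never seen in the reference set gets a small pseudocount
        # so that a single rare codon does not force the index to zero.
        counts = {codon: reference_counts.get(codon, 0) or pseudocount
                  for codon in family}
        top = max(counts.values())
        for codon, count in counts.items():
            weights[codon] = count / top
    return weights


def codon_adaptation_index(sequence, weights):
    """Geometric mean of codon weights over an in-frame coding sequence.

    Codons without a weight (for example ATG or TGG, which have no
    synonyms) are skipped.
    """
    usable = len(sequence) - len(sequence) % 3
    codons = [sequence[i:i + 3] for i in range(0, usable, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


# Toy reference set (hypothetical counts): lysine and glutamate codons only.
TOY_FAMILIES = [("AAA", "AAG"), ("GAA", "GAG")]
TOY_REFERENCE_COUNTS = {"AAA": 10, "AAG": 90, "GAA": 80, "GAG": 20}
```

A gene built only from the preferred codons scores 1, and each use of a rarer synonym pulls the score toward 0, which is why the index lies between 0 and 1.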

Statistical analysis. The JMP program was used with the aid of T. Tully. The JMP program showed that
neither mRNA nor protein abundances were normally distributed; therefore, Spearman rank correlation
coefficients (r_s) were calculated. The mRNA (adjusted and unadjusted) and protein data were also
transformed so that Pearson product-moment correlation coefficients (r_p) could be calculated. First,
this was done by a Box-Cox transformation of log-transformed data. This transformation produced normal
distributions, and an r_p of 0.76 was achieved. However, because the Box-Cox transformation is
complex, we also did a simpler logarithmic transformation. This produced a normal distribution for the
protein data. However, the distribution for the mRNA and adjusted mRNA data was close to, but not
quite, normal. Nevertheless, we calculated the r_p and found that it was 0.76, identical to the
coefficient from the Box-Cox transformed data. We therefore believe that this correlation coefficient
is not misleading, despite the fact that the log(mRNA) distribution is not quite normal.
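The two coefficients can be computed with a few lines of standard-library Python. This is a minimal sketch, not the JMP procedure: it covers the Spearman rank coefficient and the Pearson coefficient on log-transformed data, omits the Box-Cox step, and uses function names chosen here for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def log_pearson(xs, ys):
    """Pearson correlation after a log10 transform (all values must be > 0)."""
    return pearson([math.log10(x) for x in xs], [math.log10(y) for y in ys])
```

The rank-based coefficient makes no assumption about the shape of the distributions, which is why it is the natural first choice when the raw abundances are far from normal.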



RESULTS 

Visualization of 1,400 spots on three gel systems. Yeast proteins have isoelectric points ranging from
3.1 to 12.8, and masses ranging from less than 10 kDa to 470 kDa. It is difficult to examine all
proteins on a single kind of gel, because a gel with the needed range in pI and mass would give poor
resolution of the thousands of spots in the central region of the gel. Therefore, we have used three
gel systems: (i) pH "4 to 8" with 10% polyacrylamide; (ii) pH "3 to 10" with 10% polyacrylamide; and
(iii) nonequilibrium with 15% polyacrylamide (7, 8). Each gel system allows good resolution of a
subset of yeast proteins.

Figure 1 shows a pH 4-8, 10% polyacrylamide gel. The pH at the basic end of the isoelectric focusing
gel cannot be maintained throughout focusing, and so the proteins resolved on such gels have
isoelectric points between pH 4 and pH 6.7. For these pH 4-8 gels, we see 600 to 900 spots on the best
gels after multiple exposures.

The pH 3-10 gels (not shown) extend the pI range somewhat beyond pH 7.5, allowing detection of several
hundred additional spots. Finally, we use nonequilibrium gels with 15% acrylamide in the second
dimension. These allow visualization of about 100 very basic proteins and about 170 small proteins
(less than 20 kDa). In total, using all three gel systems, about 1,400 spots can be seen. These
represent about 1,200 different proteins, which is about one-quarter to one-third of the proteins
expressed under these conditions (27, 30). Here, we focus on the proteins seen on the pH 4-8 gels.

Although nearly all expressed proteins are present on these gels, the number seen is limited by a
problem we call coverage. Since there are thousands of proteins on each gel, many proteins comigrate
or nearly comigrate. When two proteins are resolved, but are close together, and one protein spot is
much more intense than the other, a problem arises in visualizing the weaker spot: at long exposures
when the weak signal is strong enough for detection, the signal from the strong spot spreads and
covers the signal from the weaker spot. Thus, weak spots can be seen only when they are well separated
from strong spots.

For a given gel, the number of detectable spots initially rises with exposure time. However, beyond an
optimal exposure, the number of distinguishable spots begins to decrease, because signals from strong
spots cover signals from nearby weak spots. At long exposures, the whole autoradiogram turns black.
Thus, there is an optimum exposure yielding the maximum number of spots, and at this exposure the
weakest spots are not seen.

Largely because of the problem of coverage, the proteins seen are strongly biased toward abundant
proteins. All identified proteins have a CAI of 0.18 or more, and we have identified no transcription
factors or protein kinases, which are nonabundant proteins. Thus, this technology is useful for
examining protein synthesis, amino acid metabolism, and glycolysis but not for examining
transcription, DNA replication, or the cell cycle.

Spot identification. The identification of various spots has been described elsewhere (7, 8). At
present, 169 different spots representing 148 proteins have been identified. Many of these spots have
been independently identified (2, 10, 23, 25). The main methods used in spot identification have been
analysis of amino acid composition, gene overexpression, peptide sequencing, and mass spectrometry.

Pulse-chase experiments and protein turnover. Pulse-chase experiments were done to measure protein
half-lives (Materials and Methods). Cells were labeled with [35S]methionine for 10 min, and then an
excess of unlabeled methionine was added. Samples were taken at 0, 10, 20, 30, 60, and 90 min after
the beginning of the chase. Equal amounts of 35S were loaded from each sample; 2D gels were run, and
spots were quantitated. Surprisingly, almost every spot was nearly constant in amount of radioactivity
over the entire time course (not shown). A few spots shifted from one position to another because of
posttranslational modifications (e.g., phosphorylation of Rpa0 and Efb1). Thus, the proteins being
visualized are all or nearly all very stable proteins, with half-lives of more than 90 min. Gygi et
al. (10) have come to a similar conclusion by using the N-end rule to predict protein half-lives. This
result does not imply that all yeast proteins are stable. The proteins being visualized are abundant
proteins; this is partly because they are stable proteins.

Protein quantitation. Because all of the proteins seen had effectively the same half-life, the
abundance of each protein was directly proportional to the amount of radioactivity incorporated during
labeling. Thus, after taking into account the total number of protein molecules per cell, the average
content of methionine and cysteine, and the methionine and cysteine content of each identified
protein, we could calculate the abundance of each identified protein (Tables 1 and 2; Materials and
Methods). About 1,000 unidentified proteins were also quantified, assuming an average content of Met
and Cys.

Many proteins give multiple spots (7, 8). The contribution 
from each spot was summed to give the total protein amount. 
However, many proteins probably have minor spots that we are 
not aware of, causing the amount of protein to be underesti- 
mated. 

When the proteins on a pH 4-8 gel were ordered by abun- 
dance, the most abundant protein had 8,904 ppm, the 10th 
most abundant had 2,842 ppm, the 100th most abundant had 
314 ppm, the 500th most abundant had 57 ppm, and the 
1,000th most abundant (visualized at greater than optimum 
exposure) had 23 ppm. Thus, there is more than a 300-fold 
range in abundance among the visualized proteins. The most 
abundant 10 proteins account for about 25% of the total pro- 
tein on the pH 4-8 gel, the most abundant 60 proteins account 
for 50%, and the most abundant 500 proteins account for 80%. 
Since it seems likely that the pH 4-8 gels give a representative 
sampling of all proteins, we estimate that half of the total 
cellular protein is accounted for by fewer than 100 different 
gene products, principally glycolytic enzymes and proteins in- 
volved in protein synthesis. 

Correlation of protein abundance with mRNA abundance. Estimates of mRNA abundance for each gene have
been made by SAGE (27) and by hybridization of cRNA to oligonucleotide arrays (30). These two methods
give broadly similar results, yet each method has strengths and weaknesses (Materials and Methods).
Table 1 lists the number of molecules of mRNA per cell for each gene studied. One measurement (mRNA)
uses data from SAGE analysis alone (27); a second incorporates data from both SAGE and hybridization
(30) (adjusted mRNA) (Table 1; Materials and Methods). We correlated protein abundance with mRNA
abundance (Fig. 2). For adjusted mRNA versus protein, the Spearman rank correlation coefficient, r_s,
was 0.74 (P < 0.0001), and the Pearson correlation coefficient, r_p, on log-transformed data
(Materials and Methods) was 0.76 (P < 0.00001). We obtained similar correlations for mRNA versus
protein and also for other data transformations (Materials and Methods). Thus, several statistical
methods show a strong and significant correlation between mRNA abundance and protein abundance. Of
course, the correlation is far from perfect; for mRNAs of a given abundance, there is at least a
10-fold range of protein abundance (Fig. 2). Some of this scatter is probably due to
posttranscriptional regulation, and some is due to errors in the mRNA or protein data. For example,
the protein Yef3 runs poorly on our gels, giving multiple smeared spots. Its abundance has probably
been underestimated, partly explaining the low protein/mRNA ratio of Yef3. It is the most extreme
outlier in Fig. 2.

Mol Cell Biol.

These data on mRNA (27, 30) and protein abundance (Table 1) suggest that for each mRNA molecule, there
are on average 4,000 molecules of the cognate protein. For instance, for Act1 (actin) there are about
54 molecules of mRNA per cell and about 205,000 molecules of protein. Assuming an mRNA half-life of
30 min (12) and a cell doubling time of 120 min, this suggests that an individual molecule of mRNA
might be translated roughly 1,000 times. These calculations are limited to mRNAs for abundant
proteins, which are likely to be the mRNAs that are translated best.

A full complement of cell protein is synthesized in about 120 min under these conditions. Thus, 4,000
molecules of protein per molecule of mRNA implies that translation initiates on an mRNA about once
every 2 s. This is a remarkably high rate; it implies that if an average mRNA bears 10 ribosomes
engaged in translation, then each ribosome completes translation in 20 s. If an average protein has
450 residues, this in turn implies translation of over 20 amino acids per s, a rate considerably
higher than that estimated for mammals (3 to 8 amino acids per s) (18). These estimates depend on the
amount of mRNA per cell (11, 27).
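The rate estimates in the two preceding paragraphs follow from simple arithmetic, reproduced in the sketch below with the inputs quoted in the text. The function name is hypothetical, and the formula used for translations per mRNA molecule (proteins per mRNA scaled by the ratio of mRNA half-life to doubling time) is one simple reading of how the "roughly 1,000" figure could arise, not a derivation given in the text.

```python
# Protein-to-mRNA ratio for actin, from the copy numbers quoted in the text.
ACT1_PROTEIN_PER_MRNA = 205_000 / 54


def translation_estimates(proteins_per_mrna=4000.0, doubling_time_min=120.0,
                          mrna_half_life_min=30.0, ribosomes_per_mrna=10.0,
                          residues_per_protein=450.0):
    """Back-of-envelope translation rates implied by the protein/mRNA ratio."""
    # One cell's worth of protein is made per doubling, so each mRNA "slot"
    # must initiate proteins_per_mrna times in that interval.
    seconds_per_initiation = doubling_time_min * 60.0 / proteins_per_mrna
    # With several ribosomes per mRNA, each one has that many initiation
    # intervals to finish its polypeptide.
    seconds_per_protein = seconds_per_initiation * ribosomes_per_mrna
    residues_per_second = residues_per_protein / seconds_per_protein
    # Assumed reading: an mRNA molecule lasts about one half-life out of the
    # doubling time, so it accounts for that share of the proteins.
    translations_per_mrna = proteins_per_mrna * mrna_half_life_min / doubling_time_min
    return {
        "seconds_per_initiation": seconds_per_initiation,
        "seconds_per_protein": seconds_per_protein,
        "residues_per_second": residues_per_second,
        "translations_per_mrna": translations_per_mrna,
    }
```

Without rounding, the defaults give 1.8 s per initiation, 18 s per protein, and 25 residues per s; the text rounds the first two to 2 s and 20 s, which is how it arrives at "over 20 amino acids per s". The actin ratio works out to about 3,800, close to the average of 4,000.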

The large number of protein molecules that can be made from a single mRNA raises the issue of how
abundance is controlled for less abundant proteins. Many nonabundant proteins may be unstable, and
this would reduce the protein/mRNA ratio. In addition, many nonabundant proteins may be translated at
suboptimal rates. We have found that mRNAs for nonabundant proteins usually have suboptimal contexts
for translational initiation. For example, there are over 600 yeast genes which probably have short
open reading frames in the mRNA upstream of the main open reading frame (17a). These may be devices
for reducing the amount of protein made from a molecule of mRNA.

Correlation of codon bias with protein abundance. The mRNAs for highly expressed proteins
preferentially use some codons rather than others specifying the same amino acid (14). This preference
is called codon bias. The codons preferred are those for which the tRNAs are present in the greatest
amounts. Use of these codons may make translation faster or more efficient and may decrease
misincorporation. These effects matter most to the cell for abundant proteins, and so codon bias is
most extreme for abundant proteins. The effect can be dramatic: highly biased mRNAs may use only 25 of
the 61 codons.

We asked whether the correlation of codon bias with abundance continues for medium-abundance proteins.
There are various mathematical expressions quantifying codon bias; here, we have used the CAI (24)
(Materials and Methods) because it gives a result between 0 and 1. The r_s for CAI versus protein
abundance is 0.80 (P < 0.0001), similar to the mRNA-protein





TABLE 1. Quantitative data*

Function                 Name    CAI    mRNA    Adjusted  Protein (Glu)  Protein (Eth)  E/G
                                                mRNA      (10^3)         (10^3)         ratio

Carbohydrate metabolism  Adh1    0.810  197     197       1,230          972            0.79
                         Adh2    0.504  0                 0              963            >20
                         Cit2    0.185  1       2.8       23             288            12
                         Eno1    0.870  No Nla            410            974            2.4
                         Eno2    0.892  248     248       650            215            0.33
                         Fba1    0.868  179     179       640            608
                         Hxk1,2  0.500  13      10.5      62             46
                         Icl1    0.251  0                 0              671            >20
                         Pdb1    0.342  5       5         41             33
                         Pdc1    0.903  226     226       280            205            0.73
                         Pfk1    0.465  5       5         75             53             0.71
                         Pgi1    0.681  14      14        160            120            0.75
                         Pyc1    0.260  1       0.7       37             34
                         Tal1    0.579  5       5         110            35
                         Tdh2    0.904  63      63        430            876            NR
                         Tdh3    0.924  460     460       1,670          1,927          NR
                         Tpi1    0.817  No Nla            No Met         No Met

Protein synthesis        Efb1    0.762  33      16.5      358            362
                         Eft1,2  0.801  26      26        99             54             0.5
                         Prt1    0.303  4       0.7       12             6
                         Rpa0    0.793  246     246       277                           0.36
                         Tif1,2  0.752  29      29        233                           0.46
                         Yef3    0.777  36      36

Function                 Name     CAI

Heat shock               Hsc82    0.581
                         Hsp60    0J81
                         Hsp82    0.517
                         Hsp104   0.304
                         Kar2     0.439
                         Ssa1     0.709
                         Ssa2     0.802
                         Ssb1,2   0.850
                         Ssc1     0.521
                         Sse1     0.521
                         Sti1     0.247

Amino acid synthesis     Ade1     0.229
                         Ade3     0.276
                         Ade5,7   0.257
                         Arg4     0.229
                         Gdh1     0.585
                         Gln1     0.524
                         His4     0.267
                         Ilv5     0.801
                         Lys9     0.332
                         Met6     0.657
                         Pro2     0.248
                         Ser1     0.258
                         Trp5     0.319

Miscellaneous            Act1     0.710
                         Adk1     0.531
                         Ald6     0.520
                         Atp2     0.424
                         Bmh1     0.322
                         Bmh2     0.384
                         Cdc48    0.306
                         Cdc60    0.299
                         Erg20    0.373
                         Gpp1     0.603
                         Gsp1     0.621
                         Ipp1     0.620
                         Lcb1     0.173
                         Mol1     0.423
                         Pab1     0.488
                         Psa1     0.600
                         Rnr4     0.497
                         Sam1     0.494
                         Sam2     0.497
                         Sod1     0.376
                         Uba1     0.212
                         YKL056   0.731
                         YLR109   0.549
                         YMR116   0.777
21
41



2.9
2.3
1.3
7

10.1
4.3 
5 
50 
2.6 
8 

1.1 
4 

1.7 

1.4 

8.1 
27 
11 

3 

6 

4 
22 

3 

1.2 
5 

54 

3 

4.1 
46 
1.4 
2.4 
0.86 
5 
5 
3 
0.8
5
15
112

35
52
70
43

303
213

270 
68 
96 
25 

14 
12 
14 
41 

148 

77 
15 

152 
32 

190 
30 
15 
28 

205 
47 

181 
76 

191 

134 

32 
6 

92 
234 

115 
254 

19 

20 

41 
148 

44 

59 

63 
631 

14 
253 
930 
184 



75 
82 
135 
161 
102 
421 
324 
85 
80 
48 
44 

27 
9 
4 
41 
55 
104 
23 
109 
17 
80 
12 
8 
12 
43
159 
109 
137 
147 

26 
2 

39 
158 

39 
147 

40 

16 

19 

56 

37 

21 
20 
618 
20 
112 

40 



0.67 

2.3 

2.6 

2.3 

2.4 

1.4 

1.5 

1.2 

1.7 



1.3 

1.5 

0.7 

0.52 

0.42 



0.78 



1.4 

0.72 



0.34 
0.58 



0.47 



0.44 
0.20 



CAI, a measure of codon bias,
methionines, and so there are no reliable protein data.

TABLE 2. Functions of proteins listed in Table 1



Name

Adh1
Adh2
Cit2
Eno1
Eno2
Fba1
Hxk1
Hxk2
Icl1
Pdb1
Pdc1
Pfk1
Pgi1
Pyc1
Tal1
Tdh2
Tdh3
Tpi1
Efb1
Eft1
Eft2
Prt1
Rpa0 (RPP0)
Tif1
Tif2
Yef3
Hsc82
Hsp60
Hsp82
Hsp104
Kar2
Ssa1
Ssa2
Ssb1
Ssb2
Ssc1
Sse1
Sti1
Ade1
Ade3
Ade5,7
Arg4
Gdh1
Gln1
His4
Ilv5
Lys9
Met6
Pro2
Ser1
Trp5
Act1
Adk1
Ald6
Atp2
Bmh1
Bmh2
Cdc48
Cdc60
Erg20
Gpp1 (Rhr2)
Gsp1
Ipp1
Lcb1
Mol1 (Thi4)
Pab1
Psa1
Rnr4
Sam1
Sam2
Sod1
Uba1

YPD title lines



Alcohol dehydrogenase I; cytoplasmic isozyme reducing aceialdehyde 10 ethanol, regenerating NAD* 
Alcohol dehydrogenase II; ox>di2es elhanol to acetaldehyde, glucose repressed * 
Citrate synthase, peroxisomal (nonmitochondrial); converts acetyl-CoA and oxaloacetate to citrate plus CoA 
KiSS I g-phosphogtycerate dehydratase); converts 2-phospho-D-glycerale to pbosphoeoolpyruvate in glycolysis 
" M3^^ 10 P^oeno&te in fcofc 

Hexokinase I; converts hexoses 10 hexose phosphates in glycolysis; repressed by glucose 

Hexokinase I; converts hexoses to hexose phosphates in glycolysis and plays a regulatory role in glucose repression 
hocilrale lyase peroxisomal; carries out part of the glyoxylaie cycle; required for gluconeogenesis repression 
Pyruvate dehydrogenase complex, El beta subunit 5 £ 

Pyruvate decarboxylase isozyme 1 

Pbospbofructokinase alpha subunit, part of a complex with Pfk2p which curies out a key regulatory step in ehrcolysis 
Grucose-6-phosphaie isomerase, converts glucose-6-phosphate .0 fructase-6-phosphale 0 ^ P glyCO ' yS,S 
Pyruvate carboxylase 1: converts pyruvate to oxaloacetate for gluconeogenesis 
Transaldolase; component of nonoxidative part of pentose phosphate pathway 

r^idfKf Ett^f ^ e K f °£ cnasc £ converts D-glycer aldehyde 3-phosphale to U-dephosphogrycerate 
Glyceraldebyde-3-phosphaie dehydrogenase 3; converts D-glyceraldehyde 3-phospbate to l^dephosphofrkerate 
Triphosphate isomerase; mlerconverts gryceraldehyde-3-phospha.e and dihydroxyacelone phosphate 

Translation elongation factor EF-10; GDP/GTP exchange factor for Teflp/Tef2p 

!! ? C Qn ^ a !! on ' acl0x g-J «>n^ins diphthamide which is not essential for activity; identical to Ef I2p 
Trans a ion elongation factor EF-2; contains diphthamide which is not essential for aclrvirJ identical to Eftlp 
Translation initiation factor e!F3 beta subunit (p90); has an RNA recognition domain * " P 

Acidic ribosomal protein AO 

Translation inilialion factor 4 A (eIF4A) of the DEAD box family 
Translation initiation factor 4A (e!F4A) of the DEAD box family 
Translation elongalion factor EF-3A; member of ATP-binding cassette superfamiry 

Chaperonin homologous to E. coli HtpG and mammalian HSP90 

Mitochondrial chaperonin that cooperates with HsplOp; homolog of £ coli GroEL 

Heal- inducible chaperonin homologous to £ coli HtpG and mammalian HSP90 

inJUCCd ,hcrmoto ' e ' ancc and -solubihzing aggregates of denatured proteins; important for (psi"}- 

*" —location across the endoplasmic reticulum membrane 

Cytoplasmic chaperone; heat shock protein of the HSP70 family 
Cytoplasmic chaperone; member of the HSP70 family 
Heat shock protein of HSP70 family involved in the translational apparatus
Heat shock protein of HSP70 family, cytoplasmic

SrSTi 9 ' " SP 7? famil ^ T l,ico PX SU PP'«*>' of mutants with hvperaclivated Ras/cvclic AMP pathway 
Stress-induced protein required for optimal growth at high and low temperature; has tetratricopeptide repeats

s ^X" ,a,ras ,he stvewh s,ep in -< nw ° pu "" e 

Glutamate dehydrogenase (NADP+); combines ammonia and α-ketoglutarate to form glutamate
Glutamine synthetase; combines ammonia to glutamate in ATP-driven reaction

P W2SS^ P>,ophosphohvdrolase/histidinol dehydrogenase; 2nd, 3rd, and 10t h steps of 

KC and^ add reducioisomerase) (alpha-keu^-hyd.oxyiacyl) reductoisomerase); second step in Val 

Ho^kS fi 7 lin * ("f^OpSj reductase), seventh step m lysine biosvnthesis pathway 

^^^k^^ST"^' (S-meihylietrahydropteroyl Iriglulamate- homocysteine me.hyltransferase), methionine synthase, 7 

γ-Glutamyl phosphate reductase (phosphoglutamate dehydrogenase), proline biosynthetic enzyme
Phosphoserine transaminase; involved in synthesis of serine from 3-phosphoglycerate
Tryptophan synthase, last (5th) step in tryptophan biosynthesis pathway

^^SS^^i^^^^S" 1 ^ i^'^ and cytoskeletal functions 
Adenylate kinase (GTP:AMP phosphotransferase), cytoplasmic
Cytosolic acetaldehyde dehydrogenase

Beta subunit of F1 ATP synthase; 3 copies are found in each F1 oligomer
Homolog of mammalian 14-3-3 protein; has strong similarity to Bmh2p
Homolog of mammalian 14-3-3 protein; has strong similarity to Bmh1p

MiRM^^c^^ rCqUired ^ di;i5i0n h0m »WC membrane fusion 

T C =^ 1^^^ cmponen, of sphingoids 

^^hSSXKSSs" , ° f CVIOpbSm 3,,d nU *" : P"' ° f ,hc 3 '- J R N A - pr ocessing complex (cleavage facor I); has 4 RNA 

Mannose-1-phosphate guanyltransferase; GDP-mannose pyrophosphorylase
Ribonucleotide reductase small subunit

S-Adenosylmethionine synthetase 1
S-Adenosylmethionine synthetase 2
Copper-zinc superoxide dismutase
Ubiquitin-activating (E1) enzyme



YMR116 (Asc1) Abundant protein with effects on translational efficiency and cell size; has two WD (WD-40) repeats



a Accepted name from the Saccharomyces Genome Database and YPD. Names in parentheses represent recent changes.
b Courtesy of Proteome, Inc.; reprinted with permission.
FIG. 2. Correlation of protein abundance with adjusted mRNA abundance.
The number of molecules per cell of each protein is plotted against the number
of molecules per cell of the cognate mRNA, with an r_p of 0.76. Note the
logarithmic axes. Data for mRNA were taken from references 27 and 30 and
combined as described in Materials and Methods.



correlation, confirming a strong correlation between CAI and
protein abundance (Fig. 3). The relationship between CAI and
protein abundance is log linear from about 1,000,000 to about
10,000 molecules per cell. We have no data for rarer proteins.

It is not clear whether CAI reflects maximum or average
levels of protein expression. The proteins used for the CAI-protein
correlation included some proteins which were not
expressed at maximum levels under the condition of the experiment
(Hsc82, Hsp104, Ssa1, Ade1, Arg4, His4, and others).
When these proteins were removed from consideration and
the correlation between CAI and the remaining (presumably
constitutive) proteins was recalculated, the r_s was essentially
unchanged (not shown).

The equation describing the graph in Fig. 3 is log (protein
molecules/cell) = (2.3 × CAI) + 3.7. Thus, under certain
conditions (a CAI of 0.3 or greater; a constitutively expressed
gene), a very rough estimate of protein abundance can be
made by raising 10 to the power of [(2.3 × CAI) + 3.7].
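
As a minimal sketch of this estimate, the fitted line can be written as a small Python function. The function name and the refusal to extrapolate below a CAI of 0.3 are illustrative assumptions rather than part of the original analysis; only the coefficients 2.3 and 3.7 and the base-10 logarithm come from the equation above.

```python
def estimate_protein_abundance(cai: float) -> float:
    """Very rough protein molecules per cell from the Codon Adaptation Index.

    Uses the fitted line log10(molecules/cell) = (2.3 * CAI) + 3.7.
    The lower bound is an illustrative guard: the fit is stated only for
    constitutively expressed genes with a CAI of 0.3 or greater.
    """
    if cai < 0.3:
        raise ValueError("the fit is stated only for a CAI of 0.3 or greater")
    return 10 ** (2.3 * cai + 3.7)


if __name__ == "__main__":
    # Hypothetical CAI values, chosen only to show the scale of the estimate.
    for cai in (0.3, 0.5, 0.9):
        print(f"CAI {cai:.1f}: ~{estimate_protein_abundance(cai):,.0f} molecules/cell")
```

Because the relationship is log linear, each increase of 0.1 in CAI multiplies the estimate by the same factor (10 to the power of 0.23, or about 1.7).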

The distribution of CAI over the genome (Fig. 4) consists of
a lower, bell-shaped distribution, possibly indicating a region
where there is no selection for codon bias, and an upper, flat
distribution, starting at a CAI of about 0.3, possibly indicating
a region where there is selection for codon bias. Almost all of
the proteins whose abundance we have measured are in the
upper, flat portion of the distribution. In the lower, bell-shaped
region, we do not know whether there is a correlation between
CAI and protein abundance.

Changes in protein abundance in glucose and ethanol. A
comparison of cells grown in glucose (Fig. 1A) with cells grown
in ethanol (Fig. 1B) is shown in Table 1. As is well known,
some proteins are induced tremendously during growth on
ethanol. Two striking examples are the peroxisomal enzymes
Icl1 (isocitrate lyase) and Cit2 (citrate synthase), which are
induced in ethanol by more than 100- and 12-fold, respectively
(Fig. 1; Table 1). These enzymes are key components of the
glyoxylate shunt, which diverts some acetyl coenzyme A
(acetyl-CoA) from the tricarboxylic acid cycle to gluconeogenesis.
S. cerevisiae requires large amounts of carbohydrate for its
cell wall; in ethanol medium, this carbohydrate comes from
gluconeogenesis, which depends on the glyoxylate shunt and
on the glycolytic pathway running in reverse. The need for



gluconeogenesis also explains why glycolytic enzymes are 
abundant even in ethanol medium. Thus, 2D gel analysis shows 
the prominence of the glycolytic and glyoxylate shunt enzymes 
in cells grown on ethanol, emphasizing that gluconeogenesis, 
presumably largely for production of the cell wall, is a major 
metabolic activity under these conditions. 

During gluconeogenesis, substrate-product relationships are
reversed for the glycolytic enzymes. One might expect that not
all glycolytic enzymes would be well adapted to the reverse
reaction. Indeed, 2D gels show that in ethanol, Adh2 (alcohol
dehydrogenase 2) is strongly induced (16), while its isozyme
Adh1 is not greatly affected. Adh1 and Adh2 each interconvert
acetaldehyde and ethanol. Adh1 has a relatively high K_m for
ethanol (17 mM), while Adh2 has a lower K_m (0.8 mM) (5).
Thus, it is thought that Adh1 is specialized for glycolysis
(acetaldehyde to ethanol), while Adh2 is specialized for respiration
(ethanol to acetaldehyde) (5, 29). Similarly, Eno1 (enolase
1) is induced in ethanol, while its isozyme Eno2 (enolase 2)
decreases in abundance (Table 1) (4, 19). Eno1 is inhibited by
2-phosphoglycerate (the glycolytic substrate), while Eno2 is
inhibited by phosphoenolpyruvate (the gluconeogenic substrate)
(4). Perhaps Eno1 has a lower K_m for phosphoenolpyruvate
than does Eno2, though to our knowledge this has not
been tested. Thus, the 2D gels distinguish isozymes specialized
for growth on glucose (Adh1 and Eno2) from isozymes specialized
for ethanol (Adh2 and Eno1).

Many heat shock proteins (e.g., Hsp60, Hsp82, Hsp104, and
Kar2) were about twofold more abundant in ethanol medium
than in glucose medium. This is consistent with the increased
heat resistance of cells grown in ethanol (3).

Enzymes involved in protein synthesis (Eft1, Rpa0, and Tif1)
were about twice as abundant in glucose medium as in ethanol
medium. This may reflect the higher growth rate of the cells in
glucose.

Phosphorylation of proteins. To examine protein phosphorylation,
we labeled cells with 32P and ran 2D gels to examine
phosphoproteins. About 300 distinct spots, probably representing
150 to 200 proteins, could be seen on pH 4-8 gels (Fig. 5B).
We then aligned autoradiograms of three gels, each with a
different kind of labeled protein (32P only [Fig. 5B], 32P plus
35S [Fig. 5A], and 35S only [not shown, but see Fig. 1 for
example]). In this way, we made provisional identification of
FIG. 3. Correlation of protein abundance with CAI. The number of molecules
per cell of each protein is plotted against the CAI for that protein. Note the
logarithmic scale on the protein axis. Data for the CAI are from the YPD

FIG. 4. Distribution of CAI over the whole genome, shown in intervals of 0.030 (i.e., there are 150 genes with a CAI between 0.000 and 0.030, inclusive; 31 genes
with a CAI between 0.031 and 0.060; 269 genes with a CAI between 0.061 and 0.090; 1,296 genes with a CAI between 0.091 and 0.120; etc.). The distribution peaks
with 2,028 genes with a CAI between 0.121 and 0.150.



some of the 32P-labeled spots as particular 35S-labeled spots.
All such identifications are somewhat uncertain, since precise
alignments are difficult, and of course multiple spots may exactly
comigrate. Nevertheless, we believe that most of the
provisional identifications are probably correct. Among the
major 32P-labeled proteins are the hexokinases Hxk1 and
Hxk2, the acidic ribosome-associated protein Rpa0, the translation
factors Yef3 and Efb1, and probably Hsp70 heat shock
proteins of the Ssa and Ssb families. Rpa0 and Efb1 are quantitatively
monophosphorylated.

Many yeast proteins resolve into multiple spots on these 2D
gels (7). Yef3 has five or more spots, at least four of which
comigrate with 32P. Tpi1 has a major spot showing no 32P
labeling and a minor, more acidic spot which overlaps with
some 32P label. Tif1 has at least seven spots (7); two of these
overlap with some 32P label, but five do not (Fig. 5). Eft1 has
at least three spots (7), and none of these overlap with 32P,
although there are three nearby, unidentified 32P-labeled spots
(a, c, and d in Fig. 5). Spots that seem to be extra forms of
Met6, Pdc1, Eno2, and Fba1 can be seen in Fig. 6A, but there
is little 32P at these positions in Fig. 5. Thus, phosphorylation
explains some but not all of the different protein isoforms seen.

The cell cycle is regulated in part by phosphorylation. We
compared 32P-labeled proteins from cells synchronized in G1
with α-factor, in cells synchronized in G1 by depletion of G1
cyclins, and in cells synchronized in M phase with nocodazole.
Only very minor differences were seen, and these were difficult
to reproduce. The cell cycle proteins regulated by phosphorylation
may not be abundant enough for this technique to be
applied easily.

Centrifugal fractionation. We fractionated 35S-labeled extracts
by centrifugation (Materials and Methods). Figure 6A
shows the proteins in the supernatant of a high-speed
(100,000 × g, 30 min) centrifugation, while Fig. 6B shows the
proteins in the pellet of a low-speed (16,000 × g, 10 min)
centrifugation. Many proteins are tremendously enriched in
one fraction or the other, while others are present in both.



Most glycolytic enzymes (e.g., Tdh2, Tdh3, Eno2, Pdc1, Adh1,
and Fba1) are enriched in the supernatant fraction. The only
exception is Pfk1 (not indicated), which is found in both pellet
and supernatant fractions. Many proteins involved in protein
synthesis (Eft1, Yef3, Prt1, Tif1, and Rpa0) are in the pellet,
possibly because of the association of ribosomes with the endoplasmic
reticulum. However, Efb1 is in the supernatant, as is
a substantial portion of the Eft1. Perhaps surprisingly, several
mitochondrial proteins (Atp2 [not shown] and Ilv5) are largely
in the supernatant. Perhaps glass bead breakage of cells releases
mitochondrial proteins. The nuclear protein Gsp1 is in
the pellet fraction. The enrichment produced by centrifugation
makes it possible to see minor spots which are otherwise poorly
resolved from surrounding proteins. Figure 6B shows that the
previously identified Tif1 spot is surrounded by as many as six
other spots that cofractionate. We observed six identical or
very similar additional spots when we overexpressed Tif1 from
a high-copy-number plasmid (not shown). Signal overlaps only
one or two of these spots in 32P-labeling experiments (Fig. 5),
and so the different forms are not mainly due to different
phosphorylation states.

DISCUSSION 

Our experience with developing a 2D gel protein database
for S. cerevisiae is summarized here. With current technology, 
we can see the most abundant 1,200 proteins, which is about 
one-third to one-quarter of the proteins expressed. The re- 
maining proteins will be difficult to see and study with the 
methods that we have used, not because of a lack of sensitivity 
but because weak spots are covered by nearby strong spots. 

Of the 1,200 proteins seen, we have identified 148, with a 
bias toward the most abundant proteins. Steady application of 
the methods already used would allow identification of most of 
the remaining proteins. Gene overexpression will be particu- 
larly useful, since it is not affected by the lower abundance of 
the remaining visible proteins. 
2D gels of the kind that we have used are not suitable for
visualization of rare proteins. However, it will be possible to
study on a global basis metabolic processes involving relatively
abundant proteins, such as protein synthesis, glycolysis,
gluconeogenesis, amino acid synthesis, cell wall synthesis,
nucleotide synthesis, lipid metabolism, and the heat shock response.

Gygi et al. (10) have recently completed a study similar to
ours. Despite generating broadly similar data, Gygi et al. 
reached markedly different conclusions. We believe that both 
mRNA abundance and codon bias are useful predictors of 
protein abundance. However, Gygi et al. feel that mRNA 
abundance is a poor predictor of protein abundance and that 
"codon bias is not a predictor of either protein or mRNA 
levels" (10). These different conclusions are partly a matter of 
viewpoint. Gygi et al. focus on the fact that the correlations of 
mRNA and codon bias with protein abundance are far from 
perfect, while we focus on the fact that, considering the wide 
range of mRNA and protein abundance and the undoubted 
presence of other mechanisms affecting protein abundance, 
the correlations are quite good. 

However, the different conclusions are also partly due to
different methods of statistical analysis and to real differences
in data. With respect to statistics, Gygi et al. used the Pearson
product-moment correlation coefficient (r_p) to measure the
covariance of mRNA and protein abundance. Depending on
the subset of data included, their r_p values ranged from 0.1 to
0.94. Because of the low r_p values with some subsets of the
data, Gygi et al. concluded that the correlation of mRNA to
protein was poor. However, the r_p correlation is a parametric
statistic and so requires variates following a bivariate normal
distribution; that is, it would be valid only if both mRNA and
protein abundances were normally distributed. In fact, both
distributions are very far from normal (data not shown), and so
a calculation of r_p is inappropriate. There was no statistical
backing for the assertion that codon bias fails to predict protein
abundance.

We have taken two statistical approaches. First, we have
used the Spearman rank correlation coefficient (r_s). Since this
statistic is nonparametric, there is no requirement for the data
to be normally distributed. Using the r_s, we find that mRNA
abundance is well correlated with protein abundance (r_s =
0.74), and the CAI is also well correlated with protein abundance
(r_s = 0.80) (and also with mRNA abundance [data not
shown]). For the data of Gygi et al. (10), we obtained similar
results, though with their data the correlation is not as good; r_s
= 0.59 for the mRNA-to-protein correlation, and r_s = 0.59 for
the codon bias-to-protein correlation.
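
To make the first approach concrete, the following is a minimal pure-Python sketch of a Spearman rank correlation, computed as the Pearson correlation of the ranks with tied values given their average rank. The function names and the example abundances are hypothetical and are not taken from our data set; the sketch shows only why the statistic makes no assumption about how the raw values are distributed.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation coefficient of two sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical mRNA and protein molecules per cell for five genes.
    mrna = [1, 2, 5, 10, 50]
    protein = [3000, 9000, 8000, 60000, 400000]
    print(round(spearman(mrna, protein), 3))
```

Only the ordering of the values enters the calculation, so any monotonic rescaling of either variable (for example, taking logarithms) leaves r_s unchanged.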

In a second approach, we transformed the mRNA and protein
data to forms where they were normally distributed, to
allow calculation of an r_p (Materials and Methods). Two
transformations, Box-Cox and logarithmic, were used; both gave
good correlations with our data [e.g., r_p = 0.76 for log(adjusted
RNA) to log(protein)]. We were not able to transform the data
of Gygi et al. to a normal distribution.
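
The logarithmic variant of this second approach can be sketched as follows. This is an illustration under stated assumptions, not the procedure of Materials and Methods: the function names and example values are hypothetical, only the log10 transformation is shown (Box-Cox is omitted), and no test of normality is performed.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation coefficient of two sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def log_pearson(mrna, protein):
    """Pearson r computed on log10-transformed abundances (all values > 0)."""
    log_mrna = [math.log10(v) for v in mrna]
    log_protein = [math.log10(v) for v in protein]
    return pearson(log_mrna, log_protein)


if __name__ == "__main__":
    # Hypothetical abundances spanning several orders of magnitude.
    mrna = [1, 10, 100, 1000]
    protein = [m ** 2 for m in mrna]  # a power law, linear only in log space
    print("raw r_p:", round(pearson(mrna, protein), 3))
    print("log r_p:", round(log_pearson(mrna, protein), 3))
```

On skewed data of this kind the untransformed r_p is dominated by the few largest values, which is one reason to transform before applying a parametric statistic.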

Finally, there are also some differences in data between the
two studies. These may be partly due to the different measurement
techniques used: Gygi et al. measured protein abundance
by cutting spots out of gels and measuring the radioactivity in
each spot by scintillation counting, whereas we used
phosphorimaging of intact gels coupled to image analysis. We
compared our data to theirs for the proteins common between the
studies (but excluding proteins whose mRNAs are known to
differ between rich and minimal media, and excluding Pin,
which was anomalous in differing by 100-fold between the two
data sets). The r_s between the two protein data sets was 0.88
(P < 0.0001). Although this is a strong correlation, the fact that
it is less than 1.0 suggests that there may have been errors in
measuring protein abundance in one or both studies. After
normalizing the two data sets to assume the same amount of
protein per cell, we found a systematic tendency for the protein
abundance data of Gygi et al. to be slightly higher than ours for
the highest-abundance proteins and also for the lowest-abundance
proteins but slightly lower than ours for the middle-abundance
proteins. These systematic differences suggest some
systematic errors in protein measurement. Although we do not
know what the errors are, we suggest the following as a reasonable
speculation. For the highest-abundance proteins, we
may have underestimated the amount of protein because of a 
slightly nonlinear response of the phosphorimager screens. For 
the lowest-abundance proteins, Gygi et al. may have overesti- 
mated the amount of protein because of difficulties in accu- 
rately cutting very small spots out of the gel and because of 
difficulties in background subtraction for these small, weak 
spots. The difference in the middle abundance proteins may be 
a consequence of normalization, given the two errors above. 

The low-abundance proteins in the data set of Gygi et al.
have a poor correlation with mRNA abundance. We calculate
that the r_s is 0.74 for the top 54 proteins of Gygi et al. but only
0.22 for the bottom 53 proteins, a statistically significant
difference. However, with our data set, the r_s is 0.62 for the top 33
proteins and 0.56 (not significantly different) for the bottom 33
proteins (which are comparable in abundance to the bottom
53 proteins of Gygi et al.). Thus, our data set maintains a good
correlation between mRNA and protein abundance even at
low protein abundance. This is consistent with our speculation
that protein quantification by phosphorimaging and image
analysis may be more accurate for small, weak spots than is
cutting out spots followed by scintillation counting. Our
relatively good correlations even for nonabundant proteins may
also reflect the fact that we used both SAGE data and RNA
hybridization data, which is most helpful for the least abundant
mRNAs. In summary, we feel that the poor correlation of
protein to mRNA for the nonabundant proteins of Gygi et al.
may reflect difficulty in accurately measuring these nonabundant
proteins and mRNAs, rather than indicating a truly poor
correlation in vivo. It is not surprising that observed correlations
would be poorer with less-abundant proteins and
mRNAs, simply because the accuracy of measurement would
be worse.

How well can mRNA abundance predict protein abundance?
With r_p = 0.76 for logarithmically transformed mRNA
and protein data, the coefficient of determination, (r_p)^2, is 0.58.
This means that more than half (in log space) of the variation
in protein abundance is explained by variation in mRNA abundance.
When converted back to arithmetic values, protein
abundances vary over about 200-fold (Table 1), and (r_p)^2 =
0.58 for the log data means that of this 200-fold variation,
about 20-fold is explained by variation in the abundance of
mRNA and about 10-fold is unexplained (but could be due
partly to measurement errors). For proteins much less abundant
than those considered here, we imagine the in vivo correlation
between mRNA and protein abundance will be worse,
and other regulatory mechanisms such as protein turnover will
be more important.
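
The arithmetic behind the 20-fold and 10-fold figures can be reproduced with a short calculation. The sketch below assumes one reading that is consistent with those figures: the coefficient of determination partitions the range in log space, so the explained and unexplained fold ranges multiply to the total. The function name is hypothetical.

```python
def split_fold_variation(total_fold: float, r: float):
    """Partition a fold range in log space using the coefficient of determination.

    Returns (r squared, explained fold, unexplained fold). The two fold values
    multiply back to total_fold because their exponents sum to 1.
    """
    r_squared = r * r
    explained = total_fold ** r_squared
    unexplained = total_fold ** (1 - r_squared)
    return r_squared, explained, unexplained


if __name__ == "__main__":
    r2, explained, unexplained = split_fold_variation(200, 0.76)
    print(f"r^2 = {r2:.2f}")
    print(f"explained: about {explained:.0f}-fold")
    print(f"unexplained: about {unexplained:.0f}-fold")
```

With a 200-fold range and r_p = 0.76, this gives roughly 21-fold explained and 9-fold unexplained, in line with the rounded values quoted above.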

Some important conclusions can be drawn from this sampling
of the proteome. First, there is an enormous range of
protein abundance, from nearly 2,000,000 molecules per cell
for some glycolytic enzymes to about 100 per cell for some cell
cycle proteins (26a). Second, about half of all cellular protein
is found in fewer than 100 different gene products, which are
mostly involved in carbohydrate metabolism or protein synthesis.
Third, the correlation between protein abundance and CAI
is log linear as far as we can see, which is from about 10,000
protein molecules per cell to about 1,000,000. This is somewhat
surprising, because it implies that selective forces for codon
bias are significant even at moderate expression levels. It also
means that codon bias is a useful predictor of protein abundance
even for moderately low bias proteins. Fourth, there is a
good correlation between protein abundance and mRNA
abundance for the proteins that we have studied. This validates
the use of mRNA abundance as a rough predictor of protein
abundance, at least for relatively abundant proteins. Fifth, for
these abundant proteins, there are about 4,000 molecules of
protein for each molecule of mRNA. This last conclusion
raises questions as to how the levels of nonabundant proteins
are regulated and suggests that protein instability, regulated
translation, suboptimal rates of translation, and other mechanisms
in addition to transcriptional control may be very important
for these proteins.

ACKNOWLEDGMENTS 

We thank Neena Sareen and Nick Bizios (CSHL 2D gel laboratory)
for production of 2D gels, Tom Volpe for help with some experiments,
Corine Driessens for help with calculations and statistics, and Herman
Wijnen and Nick Edgington for comments on the manuscript. We
especially thank Tim Tully for in-depth statistical analysis and for
insightful discussions on statistical interpretations.

This work was supported by grant P41-RR02188 from the NIH
Biomedical Research Technology Program, Division of Research
Resources, to J.I.G., by Small Business Innovation Research grant R44
GM54110 to Proteome, Inc., by grant DAMD17-94-J4050 from the
Army Breast Cancer Program to B.F., and by NIH grant R01
GM45410 to B.F.

REFERENCES 

1. Baroni, M. D., E. Martegani, F. Monti, and L. Alberghina. 1989. Cell size
modulation by CDC25 and RAS2 genes in Saccharomyces cerevisiae. Mol.
Cell. Biol. 9:2715-2723.

2. Boucherie, H., F. Sagliocco, R. Joubert, I. Maillet, J. Labarre, and M. Perrot.
1996. Two-dimensional gel protein database of Saccharomyces cerevisiae.
Electrophoresis 17:1683-1699.

3. Elliott, B., and B. Futcher. 1993. Stress resistance of yeast cells is largely
independent of cell cycle phase. Yeast 9:33-42.

4. Entian, K. D., B. Meurer, H. Kohler, K. H. Mann, and D. Mecke. 1987.
Studies on the regulation of enolases and compartmentation of cytosolic
enzymes in Saccharomyces cerevisiae. Biochim. Biophys. Acta 923:214-221.

5. Ganzhorn, A. J., D. W. Green, A. D. Hershey, R. M. Gould, and B. V. Plapp.
1987. Kinetic characterization of yeast alcohol dehydrogenases. Amino acid
residue 294 and substrate specificity. J. Biol. Chem. 262:3754-3761.

6. Garrels, J. I. 1989. The Quest system for quantitative analysis of
two-dimensional gels. J. Biol. Chem. 264:5269-5282.

7. Garrels, J. I., B. Futcher, R. Kobayashi, G. I. Latter, B. Schwender, T. Volpe,
J. R. Warner, and C. S. McLaughlin. 1994. Protein identifications for a
Saccharomyces cerevisiae protein database. Electrophoresis 15:1466-1486.

8. Garrels, J. I., C. S. McLaughlin, J. R. Warner, B. Futcher, G. I. Latter, R.
Kobayashi, B. Schwender, T. Volpe, D. S. Anderson, R. Mesquita-Fuentes,
and W. E. Payne. 1997. Proteome studies of S. cerevisiae: identification and
characterization of abundant proteins. Electrophoresis 18:1347-1360.

9. Goffeau, A., B. G. Barrell, H. Bussey, R. W. Davis, B. Dujon, H. Feldmann,
F. Galibert, J. D. Hoheisel, C. Jacq, M. Johnston, E. J. Louis, H. W. Mewes,
Y. Murakami, P. Philippsen, H. Tettelin, and S. G. Oliver. 1996. Life with
6000 genes. Science 274:563-567.

10. Gygi, S. P., Y. Rochon, B. R. Franza, and R. Aebersold. 1999. Correlation
between protein and mRNA abundance in yeast. Mol. Cell. Biol. 19:1720-1730.

11. Hereford, L. M., and M. Rosbash. 1977. Number and distribution of
polyadenylated RNA sequences in yeast. Cell 10:453-462.

12. Herrick, D., R. Parker, and A. Jacobson. 1990. Identification and comparison
of stable and unstable mRNAs in Saccharomyces cerevisiae. Mol. Cell. Biol.
10:2269-2284.

13. Hodges, P. E., A. H. McKee, B. P. Davis, W. E. Payne, and J. I. Garrels. 1999.
The Yeast Proteome Database (YPD): a model for the organization of
genome-wide functional data. Nucleic Acids Res. 27:69-73.

14. Ikemura, T. 1985. Codon usage and tRNA content in unicellular and
multicellular organisms. Mol. Biol. Evol. 2:13-34.

15. Johnston, G. C., J. R. Pringle, and L. H. Hartwell. 1977. Coordination of
growth with cell division in the yeast S. cerevisiae. Exp. Cell Res. 105:79-98.

16. Johnston, M., and M. Carlson. 1992. Regulation of carbon and phosphate
utilization, p. 193-281. In E. Jones, J. Pringle, and J. Broach (ed.), The
molecular and cellular biology of the yeast Saccharomyces. Cold Spring
Harbor Laboratory Press, Cold Spring Harbor, N.Y.

17. Kornblatt, M. J., and A. Klugerman. 1989. Characterization of the enolase
isozymes of rabbit brain: kinetic differences between mammalian and yeast
enolases. Biochem. Cell. Biol. 67:103-107.

17a. Latter, G., and B. Futcher. Unpublished data.

18. Mathews, M. B., N. Sonenberg, and J. W. B. Hershey. 1996. Origins and targets
of translational control, p. 1-29. In J. W. B. Hershey, M. B. Mathews, and N.
Sonenberg (ed.), Translational control. Cold Spring Harbor Laboratory
Press, Cold Spring Harbor, N.Y.

19. McAlister, L., and M. J. Holland. 1982. Targeted deletion of a yeast enolase
structural gene. Identification and isolation of yeast enolase isozymes.
J. Biol. Chem. 257:7181-7188.

20. Monardo, P. J., T. Boutell, J. I. Garrels, and G. I. Latter. 1994. A distributed
system for two-dimensional gel analysis. Comput. Appl. Biosci. 10:137-143.

21. O'Farrell, P. H. 1975. High resolution two-dimensional electrophoresis of
proteins. J. Biol. Chem. 250:4007-4021.

22. Patterson, S. D., and G. I. Latter. 1993. Evaluation of storage phosphor
imaging for quantitative analysis of 2-D gels using the Quest II system.
BioTechniques 15:1076-1083.

23. Sagliocco, F., J. C. Guillemot, C. Monribot, J. Capdevielle, M. Perrot, E.
Ferran, P. Ferrara, and H. Boucherie. 1996. Identification of proteins of the
yeast protein map using genetically manipulated strains and peptide-mass
fingerprinting. Yeast 12:1519-1533.

24. Sharp, P. M., and W. H. Li. 1987. The Codon Adaptation Index - a measure
of directional synonymous codon usage bias, and its potential applications.
Nucleic Acids Res. 15:1281-1295.

25. Shevchenko, A., O. N. Jensen, A. V. Podtelejnikov, F. Sagliocco, M. Wilm, O.
Vorm, P. Mortensen, A. Shevchenko, H. Boucherie, and M. Mann. 1996.
Linking genome and proteome by mass spectrometry: large-scale identification
of yeast proteins from two dimensional gels. Proc. Natl. Acad. Sci. USA
93:14440-14445.

26. Thomas, B. J., and R. Rothstein. 1989. Elevated recombination rates in
transcriptionally active DNA. Cell 56:619-630.

26a. Tyers, M., and B. Futcher. Unpublished data.

27. Velculescu, V. E., L. Zhang, W. Zhou, J. Vogelstein, M. A. Basrai, D. E.
Bassett, Jr., P. Hieter, B. Vogelstein, and K. W. Kinzler. 1997. Characterization
of the yeast transcriptome. Cell 88:243-251.

28. Warner, J. 1991. Labeling of RNA and phosphoproteins in S. cerevisiae.
Methods Enzymol. 194:423-428.

29. Wills, C. 1976. Production of yeast alcohol dehydrogenase isoenzymes by
selection. Nature 261:26-29.

29a. Wodicka, L. Personal communication.
29b. Wodicka, L. Unpublished data.

30. Wodicka, L., H. Dong, M. Mittmann, M.-H. Ho, and D. J. Lockhart. 1997.
Genome-wide expression monitoring in Saccharomyces cerevisiae. Nat.
Biotechnol. 15:1359-1367.


